Annotate Images for Object Detection

This topic describes how to label features of interest with rectangle annotations, for training object detection models.

Use the Deep Learning Labeling Tool to define your output classes and to manage project files, annotations, and training rasters. Once you define a set of annotations for each training raster, the tool automatically creates object detection rasters and passes them to the training step; you do not have to take any extra steps to build them.


Start the Labeling Tool

Choose one of the following options to start the Labeling Tool:

Set Up a Deep Learning Project

Projects help you organize all of the files associated with the labeling process, including training rasters and annotations. They also keep track of class names and the values you specify for training parameters.

Define Classes

Use the Class Definitions section of the Labeling Tool to define the features you will label. The feature names determine the class names in the output classification image.

  1. Click the Add button in the lower-left corner of the Class Definitions section.
  2. Enter a Name for the first feature that you will label, then click the button to add it. The feature is added to the Class Definitions list.
  3. Optional: Click in the Color column to change the color of the class.
  4. If you will be extracting multiple features, repeat Steps 1-3 as needed for additional classes/features. A minimal sketch of the resulting class list follows these steps.
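
For illustration, the class list can be pictured as a set of name and color pairs. Here is a minimal Python sketch; the structure and field names are illustrative, not ENVI's internal format:

    # The feature names become the class names in the output classification image.
    class_definitions = [
        {"name": "Vehicle", "color": (255, 0, 0)},   # red
        {"name": "Building", "color": (0, 0, 255)},  # blue
    ]

    for index, entry in enumerate(class_definitions, start=1):
        print(f"Class {index}: {entry['name']}, RGB color {entry['color']}")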

Next, you will select the rasters to use for training.

Add Training Rasters

  1. Click the Add button below the Rasters section of the Labeling Tool. The Data Selection dialog appears.
  2. Select one or more training rasters and click OK.
    • All training rasters must have the same number of bands. If one image has a different number of bands than the others, you can select that image in the Data Selection dialog and click the Spectral Subset button. Then select the appropriate number of bands.
    • To use pre-trained model weights for object detection, click the Spectral Subset button and select exactly three bands for input. The pre-trained weights come from ImageNet, an open-source image database used for deep learning research. The benefits of pre-trained weights are faster training and a rigorously tested set of initial parameters well suited to object detection. If you do not spectrally subset the training rasters, all bands will be used and the model will start from randomly initialized weights. A sketch of a three-band spectral subset follows these steps.
    • The training rasters are listed in the Rasters section.
    • To remove all training rasters, click the Select All button, then the Remove Selected button.
  3. Optional: Select a training raster and click the Rename Selected button to provide a name that is more meaningful than its filename. You can only rename one raster at a time.
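
The three-band requirement can be illustrated outside of ENVI. The following is a minimal Python sketch, assuming a raster held as a NumPy array with shape (bands, rows, columns); the band indices are arbitrary examples:

    import numpy as np

    # Stand-in for an 8-band training raster (bands, rows, columns).
    raster = np.random.rand(8, 512, 512)

    # Spectrally subset to exactly three bands, e.g. red, green, blue.
    rgb_bands = [3, 2, 1]
    subset = raster[rgb_bands, :, :]

    assert subset.shape[0] == 3, "Pre-trained weights require exactly three bands"
    print(subset.shape)  # (3, 512, 512)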

Now you are ready to label the features in your training rasters using rectangle annotations.

Draw Rectangles

For each training raster, the "Classes" column shows the number of classes for which you have drawn features, out of the total number of classes. Follow these steps to draw rectangle annotations around features.

  1. Select a training raster from the list and click the Draw button. The training raster is displayed.

  2. To the right of the Draw button is a drop-down list with the classes you have defined. Select the class to label. The cursor changes to a crosshair symbol.

  3. Click and drag the cursor to draw a box around a feature for the specified class; a sketch after these steps shows one way such boxes can be represented. Here are some drawing tips:

    • When features are displayed at an angle, you do not have to rotate the image to make them horizontal or vertical. Keep the rotation angle at 0 degrees and draw an axis-aligned rectangle that encloses the entire rotated object.

    • Use the cyan-colored selection handles to resize the rectangle.
    • To accept a rectangle that has been drawn, click outside of it. The selection handles disappear.
    • To delete a rectangle that you just drew (with the selection handles visible), press the Del key on your keyboard. Or, click the Undo button in the ENVI toolbar to undo the last object drawn.
    • To delete an existing rectangle, click the Select button in the ENVI toolbar. Select the rectangle. When the selection handles appear, press the Del key.
    • To delete all rectangles for a given class, select the class name in the Class Definitions tab. Then click the Remove Selected Labels button.
  4. To label features for a different class, select that class in the Class Definitions tab.
  5. Draw rectangles around as many examples of the feature as possible within the training raster. For the most accurate results with object detection, try to label several hundred examples.
  6. To use a different training raster for drawing, select its name in the Labeling Tool and click the Draw button.
  7. Continue labeling the image in this manner for each feature/class. Your progress is automatically saved to the current project, even if you close the Labeling Tool.
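
A rectangle annotation is essentially a class label plus an axis-aligned bounding box. The following Python sketch shows one way such records could be represented and checked; the field names and coordinate convention are assumptions for illustration, and ENVI itself stores annotations in its own format:

    # Each annotation pairs a class name with (x_min, y_min, x_max, y_max)
    # pixel coordinates. A rotated feature is still enclosed by an
    # axis-aligned box drawn at a rotation angle of 0 degrees.
    annotations = [
        {"class": "Vehicle", "bbox": (120, 340, 156, 372)},
        {"class": "Vehicle", "bbox": (410, 88, 452, 131)},
    ]

    def is_valid(bbox, width, height):
        """Check that a box has positive area and lies within the raster extent."""
        x_min, y_min, x_max, y_max = bbox
        return 0 <= x_min < x_max <= width and 0 <= y_min < y_max <= height

    for record in annotations:
        print(record["class"], record["bbox"], is_valid(record["bbox"], 1024, 1024))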

Note: The Labeling Tool synchronizes the names and colors of classes and their associated annotations. Do not change the names and colors of annotations outside of the Labeling Tool.

Optional: Import Annotations

Use the Import Annotations option to import rectangle annotations created in one project into a new project so that you do not have to redraw them in the new project.

Note: The Import Vectors option is disabled for Object Detection projects. It only applies to Pixel Segmentation projects.

Follow these steps:

  1. Define classes, as described earlier.
  2. Add training rasters, as described earlier.
  3. In the Deep Learning Labeling Tool, select the training raster for which you want to import an annotation file.
  4. Click the Options button and select Import Annotations. The Select Annotation Filename dialog appears.
  5. Navigate to a deep learning project directory for which you previously labeled training rasters with rectangle annotations. Within this directory is a subdirectory containing the label rasters. Select the annotation file named annotations.anz in this subdirectory, and click Open. The Match Input Annotations to Class Definitions dialog appears. The Input Annotations list on the left side of the dialog shows the input annotation classes.
  6. Click on an item in the Input Annotations list, then click on the matching class in the Class Definitions list. A line is drawn between the two items. Repeat this step for the remaining classes; a sketch of the resulting class mapping follows these steps.
  7. Click OK. The annotations are imported into the deep learning project.
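
Conceptually, the dialog builds a one-to-one mapping from input annotation classes to project classes. Here is a minimal Python sketch, with example class names:

    input_classes = ["car", "truck"]                        # classes read from annotations.anz
    project_classes = ["Vehicle - Car", "Vehicle - Truck"]  # classes defined in the new project

    # One-to-one mapping built by pairing items in the dialog.
    class_map = {"car": "Vehicle - Car", "truck": "Vehicle - Truck"}

    assert all(target in project_classes for target in class_map.values())
    for source in input_classes:
        print(f"{source} -> {class_map[source]}")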

Optional: View Project and Labeling Statistics

Click the Options drop-down list and select Show Labeling Statistics to view information about the current project and number of labels.

The top of the Project Statistics dialog provides general information about the project.

To save the project statistics to a text file, click the Save button in the Project Statistics dialog. Then select a location to save the text file. The default filename is report.txt.

To copy the project statistics to your system's clipboard, click the Copy button in the Project Statistics dialog.

Optional: Create Label Rasters Without Training

You can create object detection rasters (also called label rasters) without proceeding with training. To do this, click the Options drop-down list and select Generate Label Rasters. ENVI automatically creates object detection rasters for training and populates the "Label Raster" column with "OK" for each training raster.

Train the Object Detection Model

Once you have labeled all of your training rasters with rectangle annotations, the next step is to train the deep learning model. The Labeling Tool provides a simplified way to train models.

Follow these steps:

  1. Click the Train button in the Labeling Tool. The Train Deep Learning Model dialog appears.
  2. Use the Training/Validation Split (%) slider to specify the percentage of data to use for training versus validation.
  3. Enable the Shuffle Rasters option to shuffle the training and validation data before splitting them; this helps prevent bias in the training. Disabling this option means that the same images are used for training and validation in each training run, which helps achieve more repeatable results.
  4. Enable the Pad Small Features option when features are small; for example: vehicles, utilities, road markings, etc. Labeled features must be at least 25 pixels in both the X and Y directions. If they are smaller than this, the Pad Small Features option pads them with extra pixels so they are at least 25 pixels in both directions (the sketch after these steps includes such a padding helper).

  5. Set the Augment Scale option to Yes to augment the training data with resized (scaled) versions of the data. See Data Augmentation.
  6. Set the Augment Rotation option to Yes to augment the training data with rotated versions of the data. See Data Augmentation.
  7. Optional: In the Number of Epochs field, enter the number of epochs to run. Training parameters are adjusted at the end of each epoch. The default value is 25.
  8. Optional: In the Patches per Batch field, specify the number of patches to run per batch. A patch is a small image subset passed to the trainer to help it learn what a feature looks like. The default patch size used for object detection is 640 x 640 pixels. A batch comprises one iteration of training; model parameters are adjusted at the end of each iteration. Batches are run in an epoch until the number of patches per epoch is met or exceeded. The default value is 1.

    The Patches per Batch parameter controls how much data you send to the trainer in each batch. This is directly tied to how much GPU memory you have available; with more GPU memory, you can increase the Patches per Batch. The following table shows the amount of GPU memory successfully tested with different values:

    GPU memory (MB)    Patches per Batch
    5099               1
    5611               2
    9707               3-4
    10731              5-8
    11711              9-10

  9. Optional: In the Feature Patch Percentage field, specify the percentage of patches that contain labeled features to use during training. Values should range from 0 to 1. This applies to both the training and validation datasets. The default value is 1, which means that 100% of the patches that contain features will be used for training. The resulting patches are then used as input to the Background Patch Ratio, described in the next step.

    Example: Suppose that an object detection raster has 50 patches that contain labeled features. A Feature Patch Percentage of 0.4 means that 20 of those patches will be used for training (20/50 = 0.4, or 40%).

    The default value of 1 ensures that you are training on all of the features that you labeled. In general, if you have a large training dataset (hundreds of images), lowering the Feature Patch Percentage will decrease training time.

  10. Optional: In the Background Patch Ratio field, enter the ratio of background patches (those that contain no labeled features) to patches with features. For example, a ratio of 1.0 for 100 patches with features would provide 100 patches without features. The default value is 0.15.

    When features are sparse in a training raster, training can be biased by the empty patches throughout. The Background Patch Ratio parameter lets you restrict the number of empty patches, relative to those that contain features. Increasing the value tends to reduce false positives, particularly when features are sparse. For example, increasing the Background Patch Ratio to 0.25 and training longer (by increasing the Number of Epochs to 60) resulted in fewer false positives when identifying vessels, which were sparse compared to the rest of the image. A sketch after these steps illustrates the patch-count arithmetic.

  11. Specify a filename (.h5) and location for the Output Model. This will be the "best" trained model, which is the model from the epoch with the lowest validation loss.
  12. Optional: Specify a filename (.h5) and location for Output Last Model. In most cases the best model will outperform the last model, but not always; having both outputs lets you choose the model that works best for your scenario.
  13. Click OK. ENVI automatically creates object detection rasters for training and populates the "Label Raster" column with "OK" for each training raster. Training a model takes a significant amount of time due to the computations involved. Depending on your system and graphics hardware, processing can take several minutes to several hours. A Training Model dialog shows the progress of training.

    At the same time, a TensorBoard page displays in a new web browser. TensorBoard is a visualization toolkit included with TensorFlow. It reports real-time metrics such as Loss, Accuracy, Precision, and Recall during training. See View Training Metrics for details.
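
Several of the parameters above reduce to simple arithmetic. The following Python sketch illustrates the Training/Validation Split and Shuffle Rasters behavior (Steps 2-3), the 25-pixel minimum behind Pad Small Features (Step 4), and the patch selection performed by Feature Patch Percentage and Background Patch Ratio (Steps 9-10). It mirrors only the behavior documented here; it is not ENVI's implementation, and the function names are illustrative.

    import random

    def split_rasters(rasters, train_percentage, shuffle=True, seed=None):
        """Training/Validation Split (%) with the optional Shuffle Rasters behavior."""
        items = list(rasters)
        if shuffle:
            random.Random(seed).shuffle(items)
        n_train = round(len(items) * train_percentage / 100.0)
        return items[:n_train], items[n_train:]

    def pad_bbox(bbox, min_size=25):
        """Pad Small Features: grow a box to at least min_size pixels per axis."""
        x_min, y_min, x_max, y_max = bbox
        if x_max - x_min < min_size:
            pad = (min_size - (x_max - x_min)) / 2.0
            x_min, x_max = x_min - pad, x_max + pad
        if y_max - y_min < min_size:
            pad = (min_size - (y_max - y_min)) / 2.0
            y_min, y_max = y_min - pad, y_max + pad
        return x_min, y_min, x_max, y_max

    def select_patch_counts(feature_patches, feature_patch_percentage,
                            background_patch_ratio):
        """Apply Feature Patch Percentage, then Background Patch Ratio."""
        used = int(feature_patches * feature_patch_percentage)
        background = int(used * background_patch_ratio)
        return used, background

    train, validation = split_rasters(["r1", "r2", "r3", "r4", "r5"], 80, seed=42)
    print(len(train), len(validation))         # 4 rasters for training, 1 for validation

    print(pad_bbox((100, 100, 110, 140)))      # (92.5, 100, 117.5, 140)

    # The worked example from Step 9, with the default Background Patch Ratio:
    print(select_patch_counts(50, 0.4, 0.15))  # (20, 3): 20 feature + 3 background patches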

When training is complete, you can pass the trained model to the TensorFlow Object Classification tool.
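
The output .h5 files are TensorFlow Keras models, so outside of ENVI they can often be inspected with standard TensorFlow tools. This is an assumption rather than a documented workflow: a model that uses ENVI-specific custom layers would need a custom_objects argument to load. A minimal sketch:

    import tensorflow as tf

    # Hypothetical: inspect the trained model outside of ENVI. The filename is
    # an example; loading may require custom_objects for ENVI-specific layers.
    model = tf.keras.models.load_model("object_model.h5", compile=False)
    model.summary()  # layer-by-layer overview of the trained network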

See Also

Object Detection, TensorFlow Object Classification